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Abstract 

The metabolome is a highly dynamic entity and the final readout of the genotype x environment x phenotype (GxExP) 
relationship of an organism. Monitoring metabolite dynamics over time thus theoretically encrypts the whole range of 
possible chemical and biochemical transformations of small molecules involved in metabolism. The bottleneck is, however, 
the sheer number of unidentified structures in these samples. This represents the next challenge for metabolomics 
technology and is comparable with genome sequencing 30 years ago. At the same time it is impossible to handle the 
amount of data involved in a metabolomics analysis manually. Algorithms are therefore imperative to allow for automated 
m/z feature extraction and subsequent structure or pathway assignment. Here we provide an automated pathway inference 
strategy comprising measurements of metabolome time series using LC- MS with high resolution and high mass accuracy. 
An algorithm was developed, called mzCroup Analyzer, to automatically explore the metabolome for the detection of 
metabolite transformations caused by biochemical or chemical modifications. Pathways are extracted directly from the data 
and putative novel structures can be identified. The detected m/z features can be mapped on a van Krevelen diagram 
according to their H/C and O/C ratios for pattern recognition and to visualize oxidative processes and biochemical 
transformations. This method was applied to Arabidopsis thaliana treated simultaneously with cold and high light. Due to a 
protective antioxidant response the plants turn from green to purple color via the accumulation of flavonoid structures. The 
detection of potential biochemical pathways resulted in 15 putatively new compounds involved in the flavonoid-pathway. 
These compounds were further validated by product ion spectra from the same data. The mzCroupAnalyzer is implemented 
in the graphical user interface (GUI) of the metabolomics toolbox COVAIN (Sun & Weckwerth, 2012, Metabolomics 8: 81-93). 
The strategy can be extended to any biological system. 
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Introduction 

Metabolomic techniques have been recently established and 
refined to characterize the widely heterogeneous small molecules 
present at a specific time in a living tissue. Several analytical 
approaches exist for the application of metabolomics to various 
biological questions, for instance gas chromatography coupled to 
mass spectrometry (GC-MS) for the detection of small and volatile 
compounds [1] or capillary electrophoresis coupled to mass 
spectrometry (CE-MS) for charged compounds [2]. The method 
with the highest preference for larger and more hydrophobic 
metabolites and complementary to GC-MS is liquid chromatog- 
raphy coupled to mass spectrometry (LC-MS) [3,4]. Recendy, we 
showed that the GC-MS and LC-MS techniques can be integrated 
into a combined platform to increase the total coverage of the 
metabolome, as well as to provide insights into the mutual 
regulation of both primary and secondary metabolism by analysis 
of the same sample and subsequent data merging and processing 
[4-6]. Metabolomics-via-LC-MS approaches have yet to hit the 



ranks of other analytical techniques with respect to their 
robustness and database availability, but no other method has 
the potential to achieve a better coverage of the metabolome whilst 
maintaining high resolution and inferring additional structural 
information in the process. In contrast to metabolite profiling on 
the GC-MS platform, where standard operating procedures and 
large databases are present and being improved continuously, 
there exists little standardization on how to approach the analysis 
of larger metabolites using LC-MS [7] . A significant step forward 
in the field of untargeted metabolomics is the advent of 
instruments capable of sub-ppm mass accuracy measurements as 
well as precursor fragmentation, enabling the acquisition of the 
exact mass as well as obtaining molecule fragment information in 
order to construct a meaningful sum formula [8,9]. Owing to these 
technological advances, metabolomics has already proven to be a 
valuable tool in fields like biomarker discovery and functional 
genomics [10-12]. In this study, we introduce an algorithm called 
mzGroup Analyzer to provide characterization and identification of 
data signals acquired by an LC-Orbitrap-FT-MS system utilizing 
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sub-ppm mass measurements and intelligent sum formula query. 
We applied this strategy to plant secondary metabolism. Plants are 
most important resources for natural products, providing the 
highest diversity of chemical structures in the range of 200.000- 
400.000 different compounds and a high in vivo plasticity in 
response to environmental conditions. Due to this diversity of 
chemical structures ranging from simple hydrocarbons to complex 
heteroatomic molecules, the elucidation of the metabolome of 
higher plants and in general of natural products of any origin poses 
a challenge for metabolomic techniques. Traditionally, the 
metabolism of higher plants is subclassified into the primary 
metabolism, which comprises molecules with a low molecular 
weight that are involved in the central energy conversion cycles of 
the organism, and the secondary metabolism, which is not per se 
involved in energy homeostasis but rather involved in the chemical 
communication with the environment. Only recently, the field of 
plant secondary metabolites has begun moving more and more 
into the focus of new bio-analytical techniques and their 
stereotypic role as simple chemical weapons is being revised as 
numerous findings indicate their ability to stimulate vital processes 
in the cell by regulating the concentration of reactive oxygen 
species in vivo [13,14]. Also, secondary metabolites are of vast 
interest to the area of medicine and nutrition, as these 
phytochemicals often possess physiological activity in the human 
body, for example antioxidant activity or cancer chemoprevention 
[15]. 

To provide a suitable experimental system, we stressed 
Arabidopsis thaliana plants by parallel cold and light treatment 
introducing high oxidative stress over a 3-week interval. This led to 
a comprehensive switch of the vegetative growth metabolism to 
protective accumulation of secondary metabolites. We show a way 
of embedding the multitude of signals into a biochemical context 
through automated detection of metabolic steps between the 
acquired mlz features. Putative structures are inferred by analysis 
of the product ions from the same data. By applying mzGroupA- 
nalyzer to LC-MS raw data, we are able to prove the existence and 
relation of known molecules as well as propose novel compounds 
and pathways, in the present case molecules arising during 
oxidative stress within the secondary metabolism of Arabidopsis 
thaliana. This strategy can be applied systematically and conve- 
niendy to any kind of LC-MS data set and is expected to improve 
identification and structural elucidation in complex metabolomics 
data, which is currently the limiting step in large-scale metabo- 
lomics studies. 

Materials and Methods 

Plant material and harvest 

Arabidopsis thaliana Col-0 was cultivated in a growth chamber 
under controlled conditions: light intensity was 280 |J.mol m -2 s _1 
in an 8-hour light/ 1 6-hour dark day cycle; relative humidity was 
60% with an average temperature of 22°C. Time point "zero" of 
cold stress consisted of replicates of non-stressed plants, while 
every 2 days another sample batch of cold-acclimated plants in a 
4°C cold chamber was taken. Rosette leaves were harvested 
approximately 2 hours after the beginning of the light period. 
Metabolic activity in the leaves was quenched by immediately 
putting the plant material into liquid nitrogen after harvesting. 
Deep-frozen leaf material was ground to a fine powder with a 
pesde and mortar under steady addition of liquid nitrogen and 
subsequently stored at — 80°C before measurement. 



Chemicals 

Methanol (HPLC-grade), chloroform (anhydrous, >99%, p.a.) 
and acetonitrile (UHPLC-grade) were purchased from Sigma- 
Aldrich (Vienna, Austria). Formic acid (98-100%) was purchased 
from Merck (Vienna, Austria). Chloramphenicol (>98%) and 
Ampicillin trihydrate (analytical standard) were purchased from 
Fluka (Vienna, Austria). 

Extraction procedure and sample preparation for 
secondary metabolite analysis 

For LC-MS analysis, about 50 mg of frozen plant-leaf material 
was extracted by 1 ml pre-chilled 80/20 v:v MeOH/H 2 0 solution 
containing 1 u.g of the internal standards Ampicillin and 
Chloramphenicol. Samples were centrifuged at 15.000 g for 15 
minutes. The supernatant was dried out overnight in a new tube 
and re-dissolved in 100 (0.1 of 50/50 v:v MeOH/H 2 0 solution and 
centrifuged again for 15 minutes at 20.000 g. The supernatant was 
then filtered through a STAGE tip (Empore/Disk CI 8, diameter 
47 mm) before it was conveyed into a GC vial with a micro insert 
tip. Plant extracts were further extracted with 500 u.1 of chloroform 
to remove the highly abundant lipid components. The LC-MS 
method for secondary metabolite analysis has been described 
before [6]. 

Data processing, mzGroupAnalyzer and pathway viewer 

Both mzGroupAnalyzer and Pathway Viewer have been integrated 
into the GUI of the CO VAIN toolbox [16]. The standalone 
version of COVAIN can be downloaded at http://www.univie.ac. 
at/mosys/software.html. The data processing strategy and subse- 
quent analysis of the data using mzGroupAnalyzer and Pathway viewer 
are explained in the mzGroupAnalyzer-Tutorial (Figure SI), m/z- 
values acquired by LC-MS were exported to Excel data sheets 
using the Xcalibur software. Elemental composition determination 
was enabled, with a maximum of 10 possible sum formulas for 
each compound and a ppm deviation of 1 . 

The single excel-sheets can be uploaded to mzGroupAnalyzer via 
the GUI of COVAIN. Furthermore, a user-defined rules-file is 
uploaded and the folder for storage of the result-files is provided. 
By starting the mzGroupAnalyzer, the lists of mlz values with 
associated chemical compositions from Xcalibur output are read 
and the atomic differences between m/z-pairs are calculated and 
compared with the putive chemical modifications provided by the 
rules-file. 

Based on all chemical modifications provided by the rules-file, 
mzGroupAnalyzer searches pathways between pairs of ml ' z features. 
Regarding each ml z feature as a node in a metabolic network, an 
edge connects two nodes if a chemical modification exists. These 
edges and nodes generate large networks. A pathway can therefore 
be constructed by searching the shortest path between two nodes. 
Redundant paths that are included in other longer paths are 
removed. The pathway searching algorithm can deal with time 
series data by filtering out pathways that do not reflect the correct 
chronological order of the given measurements. For example, 
there is no path from mlz feature A occurring on Day 2 to mlz 
feature B occurring on Day 1, although theoretically a chemical 
modification from A to B is possible. Finally, to better visualize the 
pathways, we further developed Pathway Viewer, which is integrated 
in mzGroupAnalyzer, and is able to plot the pathways after a series of 
user-defined filtering options like ml z range or time points. The 
following result files will be created, exported and saved: 

1 . Transformations corresponding to the rules-file. 

2. A ranking of frequendy found chemical modifications. 
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3. Putative transformations that were not listed in the rules-file. 

4. A mzStructure file for Pathway viewer. 

5. A Pajek-file (httpi/Adado.fmf.uni-lj.si/pub/networks/pajek/) 
for network visualization. 

Next the Pathway viewer is started and a table of transformations 
and the possibility for visualization of pathways is provided (see 
Figure SI - mz.GroupAnalyz.er- Tutorial). The above process is 
summarized in Figure 1. The list of chemical and biochemical 
reactions searched by mzGroupAnalyzer (currently comprising 56 
metabolic reaction steps in the provided rules-file) is shown in 
Table SI. This list can be easily extended to novel transformations. 

For the construction of the van Krevelen plots, sum formulas 
with mzGroupAnalyzer-predicted metabolic transformations were 
assorted in an Excel sheet and their O/C and H/C ratios were 
calculated. These values were exported to SigmaPlot 12.3 and 
mapped to a multiple scatter plot within the boundaries 0 to 1 for 
the O/C ratio and 0 to 2 for H/C ratio. 



Results and Discussion 

Development of the mzGroupAnalyzer algorithm to 
identify biochemical and chemical transformations in 
non-targeted metabolomics data 

The development and application of algorithms is essential to 
systematically search for biochemical and chemical transforma- 
tions of compounds and to find putative pathways in highly 
complex LC-MS based metabolomics data. We have established 
an algorithm which is able to extract putative chemical 
transformations from high mass accuracy metabolome data. The 
algorithm generates multiple potential pathways directly from raw- 
LC-MS data and visualizes these as networks. The entire approach 
is implemented as a graphical user interface (GUI) in COVAIN 
(Figure 1). COVAIN is a toolbox for statistical data mining in 
metabolomics and other OMICS approaches [16] in the Matlab 
environment. In order to evaluate the algorithm, we measured 
various reference compounds of typical plant secondary metabo- 
lites with nano-UPLC-Orbitrap-MS. Subsequendy, sum formulas 
were generated in Xcalibur 2.0 using the integrated sum formula 



Read m/z data 
and reaction 
rules 





Figure 1. Scheme of the mzGroupAnalyzer and Pathway Viewer algorithm and GUI implementation. The program reads the m/z features 
which are extracted from Xcalibur, as well as the user predefined reaction rules. Then it finds transformations between all pairs of m/z features, and 
reports the frequency of transformations for the listed and not listed but potentially existing rules. Next, the program starts searching pathways 
inside the m/z features' network. A shorter path existing in other longer paths is removed, thereby non-redundant pathways are obtained. Then, 
mzGroupAnalyzer opens the Pathway Viewer, in which pathways satisfying user-defined filtering options will be displayed on the panel. The pathway 
diagram, which consists of reaction rules, m/z feature names, compositions and time points, can be plotted by clicking the table. Finally, all the 
results, including the frequency table of transformations, the interconnected network visualization file (in Pajek's format), the inferred pathways and a 
Matlab workspace (suffixed with mzStruct.mat) containing all results, will be exported to the user-specified folder. 
doi:10.1371/journal.pone.0096188.g001 
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Figure 2. mzGroupAnalyzer is able to detect possible metabolic steps out of various proposed sum formulas for a measured m/z 
feature. A Kaempferol and Quercetin standards measured by LC-MS result into several sum formula suggestions for the measured mass-to-charge- 
ratio (m/z). Because a low ppm deviation of assigned elemental composition to the mass is not the decisive factor, the correct sum formula might not 
be the first one proposed and thus several must be looked at, which is handled by the program automatically. If mzGroupAnalyzer finds a possible 
reaction step (out of a list of reactions which can be altered manually), it is reported to the user. B MS 2 spectra of Kaempferol (left) and Quercetin 
(right). The fragmentation schemes are in accordance with published literature [42,43], The difference of one oxygen atom (nominal mass 16) is 
visible in the fragments m/z 213^229 and 241^257, while m/z 165 occurs in both product scans. 
doi:1 0.1 371 /journal.pone.00961 88.g002 



calculator. Within a 1 ppm mass accuracy window and using the 
monoisotopic masses of several common elements, possible sum 
formulas were obtained (Figure 2 and Figure SI- mzGroupAnalyzer- 
Tutorial). Many of the sum formulas did not result in feasible 
chemical structures but had a very high mass accuracy according 



to the measured compound, thus leading to false positives. As a 
consequence the correct sum formula prediction was not among 
the top first hits. The fact that high mass accuracy alone is not 
sufficient for correct sum formula prediction has been shown 
before [17,18]. This is especially true for metabolites which often 



PLOS ONE | www.plosone.org 



4 



May 2014 | Volume 9 | Issue 5 | e96188 



Automated Data Analysis in LC-MS Based Metabolomics 




0.0 0.2 0.4 0.6 0.8 1.0 



O/C 



O/C vs H/C unstressed 
• O/C vs H/C cold stressed 



Figure 3. After oxidative stress the Arabidopsis thaliana plants turn from green into purple indicating a dramatic shift in metabolism, 
specifically elevated flavonoid biosynthesis involved in oxidative stress protection [6]. A Plants turns from green to purple under high 
light and cold temperature treatment. B Van Krevelen diagram of the most abundant m/z values of unstressed (green dots) and 20-day cold stressed 
(purple dots) Arabidopsis plants. A clear shift of metabolism in the stressed plants is visible. 
doi:10.1371/journal.pone.0096188.g003 



contain elements like S and P, or even halogens. The utilization of 
electrospray ionization tends to produce Na + , K + and other 
adducts in positive ionization mode. Because mzGroupAnalyzer looks 
for putative chemical transformations of compounds the consid- 
ered number of sum formulas can be reduced. The application of 
mzGroupAnalyzer revealed metabolic conversions of the protonated 
reference compound such as the addition of a hydroxyl group to 
Kaempferol leading to Quercetin (Figure 2). Due to this chemical 
transformation the sum formula pair of Kaempferol and 
Quercetin is automatically detected. Indeed, this reaction occurs 
in the flavonol biosynthetic pathway catalyzed by a flavonoid 3'- 
monooxygenase. The constraints for detectable metabolic reac- 
tions (e.g.+lC+2H denotes a net methylation) are uploaded to the 



program before performing the analysis and can be customized by 
the user (Figure SI - mzGroupAnalyzer-Tutorial). mzGroupAnalyzer is 
also able to recognize the frequency of equidistant steps between 
m/z values and exports these data as suggestions for novel 
modifications (see Methods section). Furthermore, whole series of 
reaction steps in the data can be detected and analyzed in the 
context of metabolic pathways. This will be discussed in the next 
sections. 

Cold- and light-induced stress has a dramatic effect on 
the Arabidopsis thaliana metabolome 

To test the performance of the mzGroupAnalyzer algorithm we 
designed the following experiment. Arabidopsis thaliana plants were 
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Figure 4. Exploration of the van Krevelen diagram created by sum formulas with chemical transformations detected by 

mzGroupAnalyzer. A m/z 1 181, 1 195, 1 197 and 121 1 are interconnected with net shifts of 'A 0 2 and a CH 2 group and form a rhombic pattern. B 
Proposed fragmentation scheme of these compounds under the chosen conditions. C Product ion scans show similar fragmentation behavior of the 
polysubstituted anthocyanins. The spectrum of m/z 1195 shows a peak at m/z 549, pointing to a methyl group at the cyanidin core. A putative 
methylation site is shown. 
doi:1 0.1 371 /journal.pone.00961 88.g004 



exposed to excess irradiation in a 4°C environment. This 
treatment is described in the literature to produce high levels of 
reactive oxygen species (ROS) [19-21]. ROS, such as hydrogen 
peroxide (H 2 0 2 ), hydroxyl radicals (OH), superoxide anion (O2 ) 
and singlet oxygen ( 0 2 ), are unavoidable by-products of 
photosynthetic organisms occurring in organelles with a high 
oxidative turnover rate during normal metabolic activity and can 
be highly damaging for cells and tissues under stress [13,22,23]. 
These stress conditions require an effective scavenging system in 
order to prevent the organism from being damaged by free 
radicals [24]. Especially biomolecules from the phenylpropanoid 
family, such as flavonoids and anthocyanins, have been recognized 
as effective radical-scavenging compounds [25,26]. After several 
days of stress in our experimental systems, the plants began to 
produce violet pigments in the rosette leaves (Figure 3A). The 



purple color is the result of the accumulation of anthocyanidins as 
a response to oxidative stress [25-28]. The enhanced production 
of anthocyanidins under these stress conditions requires a large- 
scale metabolic reprogramming as recently described by us [6]. 
Leaf extracts of the cold/light-treated Arabidopsis thaliana plants 
were analyzed with LC/MS. From these analyses sum formulas of 
putative metabolites were generated based on the acquired m/z 
values focusing only on C, H and O elements (see also Figure SI — 
mzGroupAnalyzer-TulonsX). The generated sum formulas were 
loaded into a van Krevelen plot to visualize the chemical and 
biochemical transformations of metabolites during cold and light 
stress (Figure 3B). Oxidative shifts in the plot induced by cold- 
related stress might be explained by the incorporation of oxygen, 
as well as radical scavenging by redox-active compounds, such as 
aromatic hydroxyl groups which can stabilize radicals after 
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Figure 5. Identification of biochemical transformations of in vivo data using mzGroupAnalyzer. A metabolic pathway leading to a putative 
new compound m/z 1121 is revealed. Amongst several hundreds of other interlinked m/z values in the data, the figure shows metabolic transitions 
derived from sub-ppm accuracy measurements on the left side and their corresponding MS 2 product ion scans on the right. Comparison of the 
spectral information from step to step reveals the possible location of metabolic structural changes. Stereochemistry is assumed due to literature 
findings [33]. 

doi:1 0.1 371 /journal.pone.00961 88.g005 



deprotonation. Van Krevelen plots were originally introduced to 
characterize carbon-based resources like mineral oils and coal 
according to their possible chemical composition acquired by 
high-resolution mass spectrometry [29,30]; only recently, van 
Krevelen plots have been applied as useful tools for visualization of 
metabolic processes and pathways [31] as well as for sum formula 
annotation of natural organic matter (NOM) [32] . 

The investigation of van Krevelen plots composed of untargeted 
metabolomics data can validate structural familiarity between 
compounds through a metabolic pattern as depicted in Figure 4. 
In the O/C ratio range from 0.51 to 0.55 and H/C range of 1.036 
to 1.056, the chemical relation of compounds m/z 1181, 1195, 
1197 and 1211 is visible (Figure 4A), while their structures are 
validated by MS 2 product ion scans (Figure 4B and 4C). These 
compounds were also automatically detected by the mzGroupAna- 
lyzer approach when investigating the putative chemical and 
biochemical transformations from the generated sum formula hits 
(see also Figure SI - mzGroupAnalyzer-Tutorial). In the following 
section we explored the potential of mzGroupAnalyzer to reveal full 
pathways leading to novel structures. 



Broad-scale analysis of metabolic conversions and novel 
structure prediction by mzGroupAnalyzer in the cold/light 
stress metabolome of Arabidopsis thaliana 

Time-dependent sampling of Arabidopsis thaliana leaf samples in 
cold stress yielded strong alterations in the metabolic profiles (see 
above). All precursor ions from the LC/MS analysis obtained from 
one sampling time point were assigned to ten possible sum 
formulas with the following parameters: 100 max C, 200 max H, 
50 max O, 10 max N, 1 ppm maximum deviation from suggested 
sum formula. Data from all time points were exported from 
Xcalibur 2.0 software as single Excel sheets and uploaded into 
mzGroupAnalyzer using the graphical user interface (see above and 
methods; see Figure SI - mzGroupAnalyzer-Tutorial). Further, a 
user-defined rules-file of the molecular shifts of chemical and 
biochemical transformations needs to be uploaded via the GUI. 
Here, we provide a rules-file with 56 reactions. By starting the 
mzGroupAnalyzer via the GUI, the algorithm detects metabolic 
transformations between pairs of m/z values in the uploaded list of 
suggested sum formulas. A table of detected pathways is generated 
using the GUI option "Pathway Viewer". Individual pathways can 
be visualized by clicking on the corresponding cell. Furthermore, 
the whole pathway network can be exported as a Pajek-file for 
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Table 1. Putative compounds including their mzCroupAnalyzer- predicted sum formulas, the corresponding exact mass as well as 
dominant MS 2 product ion fragments. 





name or m/z 
value 


sum formula detected by 

mzGroupAnalyzer 


exact mass 


main fragments 


mass accuracy (ppm) 


reference 


A1 


C32H39O20 


743.20292 


287, 581 


0.35 


Tohge et al. [33] 


A2 


C35H41O23 


829.20331 


287, 535, 581, 785 


0.29 


Tohge et al. 


A3 


C41 H45O22 


889.23970 


287, 449, 727 


-0.11 


Tohge et al. 


A4 


C43H49O24 


949.26083 


287, 449, 787 


0.48 


Tohge et al. 


A5 


C44H47O25 


975.24009 


491, 535, 727, 931 


-0.23 


Tohge et al. 


A6 


C47H55O27 


1051.29252 


449, 889 


-0.27 


Tohge et al. 


A7 


C52H55O26 


1095.29761 


449, 933 


0.41 


Tohge et al. 


A8 


C50H57O30 


1137.29292 


491, 535, 889 


-0.13 


Bloor & Abrahams [40] 


A9 


C55H57O29 


1181.29800 


491, 535, 933, 1137 


-0.28 


Bloor & Abrahams 


A10 


C58H65O31 


1257.35043 


449, 1095 


-0.07 


Tohge et al. 


All 


Q1 H67O34 


1343.35083 


491, 535, 1095, 1299 


0.51 


Bloor & Abrahams 


A12 


C26H29O15 


581.15010 


287, 449 


0.43 


Tohge et al. 


A13 


C35H35O17 


727.18688 


287, 581 


0.23 


Tohge et al. 


A14 


C37H39O19 


787.20801 


287 


0.21 


Tohge et al. 


A15 


C 41 H45O22 


889.23970 


287 


-0.14 


Tohge et al. 


A16 


^46^45^21 


933.24478 


287, 727 


0.28 


Tohge et al. 


A17 


C52H55O26 


1095.29761 


449, 933 


0.56 


Tohge et al. 


919 


C42H47O23 


919.25026 


287, 757 


0.16 


- 


991 


C44H47O26 


991.23501 


491, 535, 743, 947 


0.17 


- 


1005 


C45H49O26 


1005.25066 


491, 535, 757, 961 


0.28 


- 


1035 


C46H51O27 


1035.26122 


491, 535, 787, 991 


0.31 


- 


1065 


C51H53O25 


1065.28704 


449, 903 


0.29 


- 


1081 


C51H53O26 


1081.28196 


449, 919 


0.68 


- 


1111 


C52H55O27 


1111.29252 


449, 949 


0.36 


- 


1121 


C53H53O27 


1121.27687 


491, 535, 873, 1077 


0.13 


- 


1125 


C53H57O27 


1125.30817 


449, 963 


0.51 


Saito et al. [39] 


1151 


C54H55O28 


1151.28744 


491, 535, 903, 1107 


0.48 


- 


1167 


C54H55O29 


1167.28235 


491, 535, 919, 1123 


0.54 


Kasai et al. [41] 


1195 


C56H59O29 


1195.31365 


505, 549, 947, 1151 


0.49 




1197 


C55Hs7O 30 


1197.29292 


491, 535, 949, 1153 


0.28 


Saito et al. 


1211 


C56H59O3O 


1211.30857 


491, 535, 963, 1167 


0.05 


Saito et al. 


1313 


^60^65^33 


1313.34026 


491, 535, 1065, 1269 


0.13 




1329 


Q0H65O34 


1329.33518 


491, 535, 1081, 1285 


0.43 




1359 


Q1H67O35 


1359.34574 


491, 535, 1111, 1315 


0.58 




1373 


Q2H69O35 


1373.37156 


491, 535, 1125, 1329 


0.67 




1549 


C72H77O38 


1 549.40873 


535, 1301, 1505 


0.57 





The nomenclature 
doi:1 0.1 371 /journal 



s according to [33]. Compounds m/z 1125, 1197 and 1211 were found in Matthiola incana by [39]. 
pone.0096188.t001 



visualization (see Figure SI - mzGroupAnalyzer-TvAonsl and 
Figure 1). Over the course of the cold and light stress, several 
highly frequent reactions were identified. Overall, the 10 
predominant reactions were mono-oxygenations, followed by 
hydrogenations, hydrations, methoxylations, methylations, oxido- 
reductions, oxidations, malonylations, hexosylations, and dihy- 
droxylations of double bonds. Due to possible false positives within 
the detected metabolic steps and reactions, these reaction lists have 
to be carefully validated manually and present preliminary results. 
More evaluation of the mzGroupAnalyzer and further proof-of- 
concept studies are needed in the future. 



Nevertheless, mzGroupAnalyzer reported many reactions, which 
were validated to be metabolic pathways: By investigating the 
information from the MS 2 spectra as well as comparing them with 
published literature (e.g. [33], see Table 1), the presence of several 
compounds of the anthocyanin family in the Arabidopsis plants was 
revealed. Additionally, the program reported metabolic steps from 
these compounds to previously undescribed m/z features, which 
are suspected to be new anthocyanins. In Figure 5, it is shown how 
a metabolic pathway is extracted by mzGroupAnalyzer. The 
prediction of the pathway is validated by corresponding MS 2 
product ion spectra from the same data set. This pathway leads to 
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Figure 6. Structure validation of m/z 1121 by WIS 3 product ion scans. Both MS 2 fragments m/z 535 and 873 result in the core cyanidin 
structure by undergoing MS 3 fragmentation, m/z 1077, the putatively decarboxylized form of 1 121, yields m/z 491, as observed in the MS 2 spectrum 
already, by scission of the 3-O-glycosidic bond. Fragment m/z 873 arises again from the breaking of the glycosidic bond at 5-0. m/z 1017 would 
comply with the complete removal of the rest of the former malonyl group together with a water loss (—60 u). A putative structure is given. 
doi:10.1371/journal.pone.0096188.g006 



a putatively novel compound m/z 1121 which was detected 
automatically by mzGroupAnalyzer: 

mzGroupAnalyzer detected reactions between the m/z signals 287, 
449, 595, 727, 889, 975 and 1121. m/z 287, the core cyanidin 
structure, is hexosylated to structure m/z 4-49, which itself shows 
the intact cyanidin structure in its MS 2 product ion scan and 
which has been reported as cyanidin 5-O-glycoside [33]. A 
coumaroyl group is then added to the sugar group in m/z 4-49, 
resulting in compound m/z 595, while showing the previous two 
compounds in the MS 2 spectrum. To m/z 595, a 5-C sugar is then 
added to the already existing hexose group - judged from the MS 2 
spectrum - forming compound m/z 121 . In the MS 2 spectrum of 
m/z 727 ("A13") only m/z 595 is visible, indicating that the sugar- 
sugar bond is more prone for collision-induced dissociation (CID) 
than the hexose-coumaroyl bond. Next, another metabolic shift of 
+6C, +10H and +50 from m/z 121 to m/z 889 ("A3") was 
detected by mzGroupAnalyzer, which indicates a hexosylation 
reaction. Indeed, MS 2 fragmentation scans again showed peaks 
at m/z 287 and m/z 449, as well as m/z 121 itself in the spectrum, 
proving our assumptions that CID is happening in both positions, 
3-0 as well as 5-0. From m/z 889 to m/z 975, a malonylation step 
was detected. In the product ion scans, the fragment m/z 727 is 
again present, as well as a new peak at m/z 535, with m/z 449 
nearly disappearing, m/z 975 has been reported as molecule "A5" 
before [33], and coincides with all our findings both in the 
metabolic route as well as in the MS 2 spectra. The last step in this 
reaction list was detected to be a coumaroylation from m/z 975 to 
m/z 1121. Again, fragment 535 is found in the MS 2 scan, while 
now a higher peak, m/z 873, emerges. We assume this peak is the 
structure of m/z 121 with another coumaroyl group in position 2 
of the xylose, as this position tends to carry further groups. The 
peaks at m/z 1077 and m/z 931 in the MS 2 product ion scan of m/ 
Z 1 121 and m/z 975, respectively, correspond to a decarboxylation 
of the malonyl group in the native molecule. Furthermore the m/z 
491 in these two MS 2 product ion scans supports the decarbox- 
ylation reaction of m/z 535. Figure 6 shows additional MS 5 



product ion scans of the isolated MS 2 product ions m/z 535, 873 
and 1077. Both MS 2 peaks m/z 535 and 873 result in the core 
cyanidin structure m/z 287 by undergoing MS 3 fragmentation, m/ 
Z 1077, the putatively decarboxylized form of m/z 1 121, yields m/z 
491, as observed in the MS 2 spectrum already, by CID-cleavage of 
the 3-O-glycosidic bond. Fragment m/z 873 arises again from the 
CID-cleavage of the glycosidic bond at 5-0. m/z 1017 is a putative 
structure generated by CID cleavage of an acetyl-group together 
with a water loss (—60 u). These findings lead us to propose m/z 
1121 as compound "[Cy+2Glc+Xyl+2Cou+Mal] + ", or systemat- 
ically, cyanidin 3-0-[2"-0-(6'"-0-(p-coumaroyl) xylosyl) 6"-0-(p- 
O-(glucosyl)-p-coumaroyl) glucoside] 5-0-(6""-0-malonyl) gluco- 
side. The resulting putative structure is shown in Figure 5. 

mzGroupAnalyzer detected a differential appearance of various 
anthocyanidins (see Table 1) during the different time points, m/z 
287 was detected in all time points, m/z 449 after 2 days, and 595, 
727, 889, 975 and 1121 only after 4 days of oxidative stress. 
Following this strategy, 15 new compounds in Arabidopsis thaliana 
could be proposed by investigating the mzGroupAnalyzer pathway 
suggestions together with their product ion spectra m/z. Putative 
compounds with their mzGroupAnalyzer- predicted sum formulas, 
the corresponding exact mass as well as dominant MS 2 product 
ion fragments are summarized in Table 1. 

Using these new substances in combination with confirmed 
compounds from the literature, a network of the anthocyanin 
family starting with the KEGG pathway was reconstructed 
(Figure 7). Product ion scans of the new compounds and then- 
reconstructed structures can be found in the Figure S2. 

Conclusions 

In this study, we showed that the application of mzGroupAnalyzer 
- a novel algorithm for untargeted identification of chemical 
modifications in metabolome data - on time-dependent, high- 
throughput LC-Orbitrap-FT-MS metabolomic profiles can give 
new insights into biochemical pathways, and combined with MS" 
scans has the power to validate known compounds and predict 



PLOS ONE | www.plosone.org 



9 



May 2014 | Volume 9 | Issue 5 | e96188 



Automated Data Analysis in LC-MS Based Metabolomics 




1005 



Figure 7. A proposed network of the detected anthocyanin family featuring putatively novel compounds as well as known 
structures including the KEGG pathway of anthocyanin biosynthesis [38,44]. Only compounds from the KEGG anthocyanin pathway are 
depicted, for which a suitable precursor mass was found in the data. Exact masses, sum formulas, and main MS 2 fragments of the new compounds 
are compiled in Table 1; reconstructed structures together with MS 2 scans are in the supporting information. The network was created with VANTED 
[45,46]. 

doi:1 0.1 371 /journal.pone.00961 88.g007 



new structures. Attempts at assigning sum formulas to highly 
resolved metabolomic data have been done before and have 
proven to be very fruitful [34,35], yet it is clear that those 
approaches ultimately have to be automatized. The visualization 
of the elemental composition of metabolites on a van Krevelen 
diagram is useful for recognition of metabolic patterns, which can 
point to a structural similarity between those molecules. 

The unambiguous structural assignment and stereochemical 
elucidation of a compound is always a time-intensive process and 
demands further validation from other analytical platforms, such 
as NMR [7] . In our approach, we circumvent some of these issues 
by capturing metabolic intermediates of a compound in the data 
set, thus creating an additional layer of information. The 
complementary analysis of product ion spectra associated with 
the predicted chemical modifications of the precursor m /^-ratios 
introduces novel aspects for structural elucidation of unknown 
metabolites, and enables further validation. While the issues of 
unambiguous sum formula annotation in high-resolution LC-MS 
metabolomics still remain, and further application and validation 



of other analytical techniques are needed, mzGroupAnalyzer proves 
to be a convenient tool for tracking metabolic changes, thus sum 
formulas, and inferring metabolic pathways from time-series data, 
leading to the prediction of entirely novel, hitherto undescribed 
compounds. Furthermore, mzGroupAnalyzer is able to handle time- 
series data and is thus able to identify time-dependent chemical 
modifications. Finally, mzGroupAnalyzer is connected with the 
Pathway Viewer which plots corresponding pathways in a user 
friendly way. 

Several strategies will be implemented in future to further 
improve the algorithm. First, the transformation frequency will be 
used in future to rank sum formulas corresponding to the same ml 
Z feature. Secondly, more strict sum formula filtering criteria will 
be applied in selecting correct sum formulas from ml ' z features 
[36]. Thirdly, more reaction rules, and how these rules are 
connected in a pathway, will be learned from metabolic pathway 
databases ([37,38] for KEGG and MetaCyc). 
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Supporting Information 

Figure SI "SI mzGA tutorial, pptx"; tutorial for mzGroupAna- 

lyzer in pptx format. 

(PPTX) 

Figure S2 "S2 MS2 putative cyanidins.pptx"; recorded MS2 
product ion scans of the putative new cyanins in pptx format. 
(PPTX) 

Table SI Rules file for chemical and biochemical transforma- 
tions. 
(XLSX) 
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